ABSTRACT

This paper describes the Fifth Data Release (DR5) of the Sloan Digital Sky Survey (SDSS). DR5 includes all survey-quality data taken through 2005 June and represents the completion of the SDSS-I project (whose successor, SDSS-II, will continue through mid-2008). It includes five-band photometric data for 217 million objects selected over 8000 deg² and 1,048,960 spectra of galaxies, quasars, and stars selected from 5713 deg² of that imaging data. These numbers represent a roughly 20% increment over those of the Fourth Data Release; all the data from previous data releases are included in the present release. In addition to “standard” SDSS observations, DR5 includes repeat scans of the southern equatorial stripe, imaging scans across M31 and the core of the Perseus Cluster of galaxies, and the first spectroscopic data from SEGUE, a survey to explore the kinematics and chemical evolution of the Galaxy. The catalog database incorporates several new features, including photometric redshifts of galaxies, tables of matched objects in overlap regions of the imaging survey, and tools that allow precise computations of survey geometry for statistical investigations.
Subject  headings:  atlases  —  catalogs  —  surveys 
Online material: color figure


1. INTRODUCTION

The primary goals of the Sloan Digital Sky Survey (SDSS) are a large-area, well-calibrated imaging survey of the north Galactic cap, repeat imaging of an equatorial stripe in the south Galactic cap to allow variability studies and deeper co-added imaging, and spectroscopic surveys of well-defined samples of roughly 10⁶ galaxies and 10⁵ quasars (York et al. 2000). The survey uses a dedicated, wide-field, 2.5 m telescope (Gunn et al. 2006) at Apache Point Observatory, New Mexico. Imaging is carried out in drift-scan mode using a 142 megapixel camera (Gunn et al. 1998) that gathers data in five broad bands, u, g, r, i, and z, spanning the range from 3000 to 10,000 Å (Fukugita et al. 1996), with an effective exposure time of 54.1 s per band. The images are processed using specialized software (Lupton et al. 2001; Stoughton et al. 2002; Lupton 2005) and are astrometrically (Pier et al. 2003) and photometrically (Hogg et al. 2001; Tucker et al. 2006) calibrated using observations of a set of primary standard stars (Smith et al. 2002) observed on a neighboring 20 inch (51 cm) telescope.

Objects are selected from the imaging data for spectroscopy using a variety of algorithms, including a complete sample of galaxies with Petrosian (1976) r magnitudes brighter than 17.77 (Strauss et al. 2002), a deeper sample of color- and magnitude-selected luminous red galaxies (LRGs) from redshift 0.15 to beyond 0.5 (Eisenstein et al. 2001), a color-selected sample of quasars with 0 < z < 5.5 (Richards et al. 2002), optical counterparts to Röntgensatellit X-ray sources (Anderson et al. 2003), and a variety of stellar and calibrating objects (Stoughton et al. 2002; Adelman-McCarthy et al. 2006). These targets are observed by a pair of double spectrographs fed by 640 optical fibers, each 3" in diameter, plugged into aluminum plates 2.98° in diameter. The resulting spectra cover the wavelength range 3800-9200 Å with a resolution of λ/Δλ ≈ 2000. The finite size of the fiber cladding means that only one of two objects closer than 55" can be targeted on a given plate; this restriction results in a roughly 10% incompleteness in galaxy spectroscopy, but this incompleteness is well characterized and is generally straightforward to correct for in statistical calculations (e.g., Zehavi et al. 2002).
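The fiber-collision constraint amounts to flagging target pairs separated by less than 55". The following is a minimal sketch, assuming targets are given as (RA, Dec) pairs in degrees; the function names are hypothetical, and the rule the survey's tiling uses to decide which member of a colliding pair receives a fiber is not modeled here.

```python
import math

FIBER_COLLISION_ARCSEC = 55.0  # minimum fiber separation quoted in the text

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0

def collided_pairs(targets, limit_arcsec=FIBER_COLLISION_ARCSEC):
    """Index pairs (i, j) of targets closer than limit_arcsec.

    targets: list of (ra_deg, dec_deg). Brute force O(N^2), adequate for one plate."""
    pairs = []
    for i in range(len(targets)):
        for j in range(i + 1, len(targets)):
            if angular_sep_arcsec(*targets[i], *targets[j]) < limit_arcsec:
                pairs.append((i, j))
    return pairs


print(collided_pairs([(180.0, 0.0), (180.0, 0.01), (180.1, 0.0)]))  # -> [(0, 1)]
```

Only the first two example targets (36" apart) collide. The quadratic loop is fine for the ~640 fibers of a single plate; a survey-wide search would normally use a spatial index instead.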

This paper presents the Fifth Data Release (DR5) of the SDSS, which follows the Early Data Release (EDR) of commissioning data (Stoughton et al. 2002) and the regular data releases DR1-DR4 (Abazajian et al. 2003, 2004, 2005; Adelman-McCarthy et al. 2006). These data releases are cumulative, so all observations in the earlier releases are also included in DR5. There have been no substantive changes to the imaging or spectroscopic software since DR2, so DR5 includes data identical to DR2-DR4 in the overlapping regions. Finkbeiner et al. (2004) presented a separate (“Orion”) release of imaging data outside the formal SDSS footprint, mostly at low Galactic latitudes.

DR5 includes all survey-quality data that were taken as part of “SDSS-I,” the phase of the SDSS that ran through 2005 June, including a variety of imaging scans and spectroscopic observations taken outside of the standard survey footprint or with nonstandard spectroscopic target selection. The second “SDSS-II” phase, which includes a number of new participating institutions and will continue through mid-2008, consists of three distinct surveys: the Sloan Legacy Survey, the Sloan Supernova Survey, and the Sloan Extension for Galactic Understanding and Exploration (SEGUE). The Legacy Survey is essentially a continuation of SDSS-I, with the goal of completing imaging and spectroscopy over about 8000 deg² of the north Galactic cap. The Supernova Survey (J. Frieman et al. 2007, in preparation) repeatedly scans a 300 deg² area in the south Galactic cap during the fall months to detect and measure time-variable objects, especially Type Ia supernovae (out to z ≈ 0.4) that can be used to measure the cosmic expansion history. SEGUE includes 3500 deg² of new imaging, mostly at Galactic latitudes below those of the original SDSS footprint, and spectroscopy of about 240,000 selected stellar targets to study the structure, chemical evolution, and stellar content of the Milky Way. Future SDSS data releases will include data from all three surveys, and some early data from SEGUE are included in DR5. An initial release of imaging data and uncalibrated object catalogs from the autumn 2005 season of the Supernova Survey is available online,⁷⁴ but it is not part of DR5.

Section 2 of this paper describes the contents of DR5, and § 3 summarizes information about data quality, including new tests of spectrophotometric accuracy. Section 4 describes several new features of DR5: photometric redshifts for galaxies, “sector/region” tables for precisely defining the survey geometry, and tools for matching repeat observations of the same objects. We conclude in § 5.

2. WHAT IS INCLUDED IN DR5

As described by Stoughton et al. (2002), public SDSS data are available both as flat files (from the Data Archive Server [DAS]) and via a flexible Web interface to the SDSS database (the Catalog Archive Server [CAS]). Information about and entry points to both interfaces can be found online.⁷⁵ The CAS is a convenient and powerful tool for selecting objects found in the SDSS based on their location, photometric parameters, and (if they were observed spectroscopically) spectroscopic parameters. FITS images and spectra for individual objects and fields are available from the CAS; the DAS should be used for bulk downloads of large quantities of data. Links to extensive documentation and examples are available on the Web site given in footnote 75.


74 See http://www.sdss.org/drsnl/DRSNl_data_release.html.

75 See http://www.sdss.org/dr5.


The principal SDSS imaging data are taken along a series of great-circle stripes that aim to fill a contiguous area in the north Galactic cap and along three noncontiguous stripes in the south Galactic cap. Each filled stripe consists of two interleaved strips because of the gaps between columns of CCDs in the imaging camera (see Gunn et al. 1998; York et al. 2000). Figure 1 shows the region of sky included in DR5 in imaging (top) and spectroscopy (bottom). In contrast to DR4, the imaging available in DR5 covers an essentially contiguous region of the north Galactic cap, with a few small patches totaling ~200 deg² remaining (nearly all of this area will be included in DR6). The area covered by the DR5 primary imaging survey (including the southern stripes but not counting these patches) is 8000 deg². The great-circle stripes in the north overlap at the poles of the survey; 21% of this region of sky is covered more than once. In any region where imaging runs overlap, one run is declared primary and used for spectroscopic target selection, and the other runs are declared secondary. DR5 includes both the primary and secondary (repeat) observations of each area and source (see § 4.3).

As spectroscopic observations necessarily lag the imaging, the DR5 spectroscopic area still has the gap at intermediate declinations that was present in the DR4 imaging coverage. The area covered by the spectroscopic survey is 5713 deg². The spectroscopic data include 1,048,960 spectra, arrayed on 1639 plates of 640 fibers each. Thirty-two fibers per plate are devoted to measurements of sky. Automated spectral classification yields approximately 675,000 galaxies, 90,000 quasars, and 216,000 stars. Nearly 99% of all spectra are of high enough quality to yield an unambiguous classification and redshift; most of the unidentified targets are either faint (r > 20) or have featureless spectra (hot stars or blazar-like active galactic nuclei; see Collinge et al. 2005). However, in rare cases the assigned redshift is far from the true redshift, so for an object with unusual properties it is important to examine the spectrum and to check the flags that indicate data-quality or classification problems. As described in the DR4 paper (Adelman-McCarthy et al. 2006), a number of plates have duplicate observations, usually just one but in some cases several. DR5 includes 62 duplicates of 53 unique main-survey plates and 10 duplicates of special plates, which take spectra of objects outside the standard survey target selection. Some main-survey objects are also reobserved on adjacent plates to check the end-to-end reproducibility of spectroscopy. In total, about 2% of main-survey objects have one or more repeat spectra.

Fig. 1. Distribution on the sky of SDSS imaging (top) and spectroscopy (bottom) included in DR5, shown in J2000.0 equatorial coordinates. The regions of sky that are new to DR5 are shaded more lightly. The upper panel includes both those regions included in the CAS (totaling 8000 deg²) and the supplementary imaging runs available only through the DAS, which consist of SEGUE scans at low Galactic latitude and scans through M31 and the Perseus Cluster.

In the fall months, when the south Galactic cap is visible from the northern hemisphere, the SDSS imaging has been confined to a stripe along the celestial equator, plus two “outrigger” stripes, centered roughly at δ = +15° and -10°, respectively (these are visible on the right-hand side of the panels of Fig. 1). We have performed multiple imaging passes of the southern equatorial stripe (stripe 82, spanning 22h20m < α < 3h20m, -1.25° < δ < +1.25°, in J2000.0 coordinates), which can be used for variability studies and for co-addition to create deeper summed images. Previous data releases have included only a single epoch of these observations. In DR5, we make available 36 runs on the northern strip of this stripe and 29 runs on the southern strip; these are all the observations of stripe 82 carried out before 2005 July that are of survey quality. Each individual run covers only part of the full right ascension range of the stripe; Figure 2 shows the number of passes available along the northern and southern strips as a function of right ascension. The central regions of the stripe have typically been covered 10-20 times. The extra runs are available in DR5 only through the DR supplemental DAS;⁷⁶ in future data releases, they will be made available through the CAS as well. Note that DR5 does not include those runs on stripe 82 at larger right ascension, in the region of Orion, as described by Finkbeiner et al. (2004). Those runs continue to be made available through the Web sites indicated in that paper.

76 Described at http://www.sdss.org/dr5/start/aboutdr5sup.html.
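Because the stripe 82 right ascension range quoted above wraps through 0h, a membership test has to treat it as the union of two intervals. A minimal sketch using only the J2000.0 boundaries given in the text (22h20m = 335°, 3h20m = 50°, |δ| < 1.25°); the function and constant names are illustrative.

```python
def hms_to_deg(hours, minutes=0.0, seconds=0.0):
    """Convert a right ascension in hours/minutes/seconds to degrees."""
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)

# Stripe 82 boundaries as quoted in the text (J2000.0).
RA_START = hms_to_deg(22, 20)   # 335 deg
RA_END = hms_to_deg(3, 20)      # 50 deg
DEC_HALF_WIDTH = 1.25           # degrees

def in_stripe82(ra_deg, dec_deg):
    """True if (RA, Dec) in degrees lies strictly inside the stripe 82 box."""
    ra = ra_deg % 360.0
    # The RA range wraps through 0h, so it is the union (RA_START, 360) U [0, RA_END).
    in_ra = ra > RA_START or ra < RA_END
    return in_ra and abs(dec_deg) < DEC_HALF_WIDTH


print(in_stripe82(350.0, 0.5), in_stripe82(100.0, 0.0))  # -> True False
```

Normalizing with `% 360.0` lets negative right ascensions (e.g., -10°) be tested without special cases.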

A combined, deep image of the full equatorial stripe is being prepared and will be made available in a future data release. However, for objects that can be detected in a single pass, the benefits of co-addition can mostly be realized simply by averaging the photometric measurements from the multiple passes, using the multiple entries in the photometric catalog rather than analyzing a summed image. Figure 3, based on the stripe 82 stellar catalog of Ivezic et al. (2007), demonstrates this improvement, showing the g - r versus u - g color-color diagram for blue, nonvariable point sources (mostly white dwarfs) in stripe 82. Data co-added at the catalog level have been used to search for faint quasars (Jiang et al. 2006), to measure the dispersion in galaxy colors on the red sequence (Cool et al. 2006), and to improve the signal-to-noise ratio (S/N) of galaxy u-band Petrosian magnitudes (Baldry et al. 2005). The stripe 82 data have also been used to search for variable and high proper motion objects (e.g., Ivezic et al. 2003) and to test the covariance of photometric errors among bands and among multiple objects in the same fields (Scranton et al. 2007). Because the catalogs from the multiple stripe 82 scans are not yet available in the CAS, averaging or variability searches must be done by downloading object tables from the DAS and identifying repeat observations of the same object by positional matching.
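In its simplest form, catalog-level co-addition groups repeat detections by position and combines them with inverse-variance weights, so that N passes of equal quality shrink the error by √N. The sketch below assumes a flat list of (RA, Dec, magnitude, error) detections pooled from several runs and a 1" matching radius; the radius and the function names are assumptions, not the procedure used for the published stripe 82 catalogs.

```python
import math

def match_radius_group(detections, ra0, dec0, radius_arcsec=1.0):
    """Return (mag, err) of detections within radius_arcsec of (ra0, dec0).

    detections: iterable of (ra_deg, dec_deg, mag, mag_err) from all runs.
    Flat-sky approximation, adequate for arcsecond radii away from the poles."""
    cos_dec = math.cos(math.radians(dec0))
    group = []
    for ra, dec, mag, err in detections:
        dra = ((ra - ra0 + 180.0) % 360.0 - 180.0) * cos_dec  # handles the RA wrap at 0/360
        ddec = dec - dec0
        if math.hypot(dra, ddec) * 3600.0 <= radius_arcsec:
            group.append((mag, err))
    return group

def weighted_mean_mag(measurements):
    """Inverse-variance weighted mean of (mag, err) pairs and its formal error."""
    weights = [1.0 / err ** 2 for _, err in measurements]
    total = sum(weights)
    mean = sum(w * mag for w, (mag, _) in zip(weights, measurements)) / total
    return mean, 1.0 / math.sqrt(total)
```

Averaging magnitudes is reasonable only at high S/N; for faint sources one would average fluxes (or asinh-scale fluxes) and convert the mean back to a magnitude.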

In addition to the repeat scans on stripe 82, several imaging runs outside of the standard footprint are included:

1. Two runs that together make a 2.5° stripe crossing M31, the Andromeda Galaxy. These imaging data have been used to search for substructure in M31’s halo (e.g., Zucker et al. 2004a, 2004b).

2. Five runs that together cover 78 deg² centered roughly on the low-redshift Perseus Cluster of galaxies.

3. Ten runs of imaging data taken as part of the SEGUE survey, including stripes at l = 50° (-46° < b < -8°), l = 110° (-36° < b < 29.5°), and l = 130° (-49° < b < -18.6°), and a stripe that runs for 20° along δ ≈ 25°.

Fig. 2. Coverage of the southern equatorial stripe in DR5. Solid and dotted lines show the number of photometric runs covering regions of different right ascension for the northern and southern strips, respectively.

As with the repeat scans of stripe 82, objects detected in these runs are recorded in the DR supplemental DAS,⁷⁷ but they are not, as yet, available in the CAS. All these runs are in quite crowded fields, as they tend to go to low Galactic latitude or pass through the center of M31. The completeness and accuracy of the photometry produced by the automated SDSS pipeline become suspect in crowded fields, so these data should be used with care. Plots and tables of the field-by-field data quality for these runs may be accessed online.⁷⁸

Because of the relatively small footprint of the imaging in the south Galactic cap, the spectroscopy of targets selected by our normal algorithms was completed quite early in the survey; most of these data were already included in DR1. We generally restrict imaging observations to pristine conditions, when the Moon is below the horizon, the sky is cloudless, and the seeing is good. To make optimal use of the remaining fall observing time, when conditions were not good enough for imaging and no main-survey spectroscopic targets were left in the south, we undertook a series of spectroscopic observing programs, based mostly on the imaging data of the equatorial stripe in the south Galactic cap, designed to go beyond the science goals of the main survey. DR5 includes 299 plates from these programs, carried out in the fall months of 2001-2004, with a total of 204,160 spectra. The great majority of these plates were already included in DR4; the target selection for them is described in the DR4 paper (Adelman-McCarthy et al. 2006), and we will not repeat it here. The science objectives include studies of galactic kinematics, calibration of photometric redshifts, evaluation of the completeness of the quasar survey (Vanden Berk et al. 2005), and surveys of galaxies that fall outside of the standard survey selection criteria (Baldry et al. 2005).

77 See http://www.sdss.org/dr5/start/aboutdrsup.html.

78 See http://das.sdss.org/DRsup/data/imaging/QA/summaryQA_analyzePC.html.

Fig. 3. The g - r vs. u - g color-color diagram for the blue, nonvariable point sources with u < 20 in the equatorial stripe (from Ivezic et al. 2007). The top panel shows results using single-epoch DR5 photometry, while the bottom panel shows the striking improvement obtained by averaging the photometric measurements from all of the imaging passes, allowing clear separation between the sequences of helium white dwarfs (the top side of the “triangle”) and hydrogen white dwarfs (which lie along the other two sides). This region of color space also includes white dwarf-M dwarf pairs, hot subdwarfs, and quasars (see, e.g., the discussion of Eisenstein et al. 2006). Main-sequence and red giant stars (far more numerous, of course) are mostly off the diagram to the upper right.

TABLE 1

Characteristics of the DR5 Imaging Survey

Parameter                              Value
Footprint area                         8000 deg² (20% increment over DR4)
Imaging catalog                        217 million unique objects
AB magnitude limits:ᵃ
  u                                    22.0 mag
  g                                    22.2 mag
  r                                    22.2 mag
  i                                    21.3 mag
  z                                    20.5 mag
Median PSF width                       1.4" in r
rms photometric calibration errors:
  r                                    2%
  u - g                                3%
  g - r                                2%
  r - i                                2%
  i - z                                3%
Astrometry errors                      <0.1" rms absolute per coordinate
Object counts:ᵇ
  Stars, primary                       85,383,971
  Stars, secondary                     28,201,858
  Galaxies, primary                    131,721,365
  Galaxies, secondary                  33,044,047

a The 95% completeness for point sources in typical seeing; 50% completeness numbers are generally 0.4 mag fainter. The difference between asinh magnitudes and conventional magnitudes is 0.004-0.015 at the 95% limits and 0.008-0.03 at the 50% limits, smaller than the uncertainty in conversion of magnitudes between surveys used to estimate the completeness.

b Primary imaging objects are those in the primary imaging area; secondary objects are in repeat imaging, so they are typically repeats of primary objects.

DR5 includes a total of 84 special plates that were not included in DR4. All of these were obtained as early data of the SEGUE program. Each SEGUE pointing includes two 640-fiber plates of different exposure times, with 592 brighter (13 < g < 18) and 560 fainter (18 < g < 20) stars targeted. The remaining targets are calibration standards and sky fibers. Target selection algorithms, which are outlined in Adelman-McCarthy et al. (2006) and will be described more fully in a future paper, identify candidate stars in the following categories: white dwarfs (25 per pointing), cool white dwarfs (10), A-type/blue horizontal branch stars (150), F turnoff and subdwarf stars (150), G stars (375), K giants (100), low-metallicity candidates (150), K dwarfs (125), M dwarfs (50), and asymptotic giant branch candidates (10). These plates are listed and described online.⁷⁹
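The per-pointing fiber budget follows directly from the numbers quoted above: two plates of 640 fibers, of which 592 + 560 go to stellar targets, leave the remainder for calibration standards and sky.

```latex
N_{\rm fibers} = 2 \times 640 = 1280, \qquad
N_{\rm stars} = 592 + 560 = 1152, \qquad
N_{\rm fibers} - N_{\rm stars} = 128
```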

Tables 1 and 2 summarize the characteristics of the DR5 imaging and spectroscopic surveys, respectively. Note that the “star” and “galaxy” divisions in Table 1 refer to the photometric pipeline classifications; stars include quasars and any other unresolved sources, and galaxies are all resolved objects, including airplane and satellite trails, etc. Classifications in Table 2 are those returned by the spectroscopic pipeline; note, in particular, that the “quasar” classification (based on the presence of a securely detected, high-excitation emission line with FWHM broader than 1000 km s⁻¹) does not include any explicit luminosity cut.
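The 1000 km s⁻¹ line-width threshold above and the 69 km s⁻¹ pixel size in the Table 2 footnote are both velocity widths; converting them to wavelength intervals is a one-line Doppler calculation. This is a minimal sketch assuming the first-order relation Δλ = λΔv/c, which is accurate for Δv ≪ c; the function names are hypothetical, and Hβ (4861 Å) is chosen only as an illustrative line.

```python
C_KM_S = 299_792.458   # speed of light in km/s
PIXEL_KM_S = 69.0      # spectral pixel width from the Table 2 footnote

def dv_to_dlambda(wavelength_A, dv_km_s):
    """Wavelength interval (Angstrom) for a velocity width dv at a given wavelength.
    First-order Doppler relation dlambda = lambda * dv / c (fine for dv << c)."""
    return wavelength_A * dv_km_s / C_KM_S

def dlambda_to_dv(wavelength_A, dlambda_A):
    """Velocity width (km/s) of a wavelength interval at a given wavelength."""
    return C_KM_S * dlambda_A / wavelength_A

def width_in_pixels(dv_km_s):
    """Number of constant-velocity pixels spanned by a velocity width."""
    return dv_km_s / PIXEL_KM_S


# Pixel width at the blue and red ends of the 3800-9200 A range.
print(round(dv_to_dlambda(3800.0, PIXEL_KM_S), 2), round(dv_to_dlambda(9200.0, PIXEL_KM_S), 2))  # -> 0.87 2.12
# The 1000 km/s quasar threshold at H-beta, in Angstrom and in pixels.
print(round(dv_to_dlambda(4861.0, 1000.0), 1), round(width_in_pixels(1000.0), 1))  # -> 16.2 14.5
# The ~0.05 A wavelength calibration uncertainty quoted in section 3, at an assumed 5000 A.
print(round(dlambda_to_dv(5000.0, 0.05), 1))  # -> 3.0
```

The first line reproduces the 0.9 Å and 2.1 Å pixel widths in the Table 2 footnote, and the last is consistent with the "<5 km s⁻¹" wavelength calibration entry.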

79 See http://www.sdss.org/dr5/products/spectra/special.html.

TABLE 2

Characteristics of the DR5 Spectroscopic Survey

Parameter                              Value
Main Survey
  Footprint area                       5713 deg² (19% increment over DR4)
  Wavelength coverage                  3800-9200 Å
  Resolution λ/Δλ                      1800-2100
  S/Nᵃ                                 >4 pixel⁻¹ at g = 20.2
  Wavelength calibration errors        <5 km s⁻¹
  Redshift accuracy                    30 km s⁻¹ rms for main galaxies; ~99% of classifications and redshifts are reliable
  Number of plates                     1639
  Number of spectraᵇ                   1,048,960
    Galaxies                           674,741
    Science primary galaxies           561,530
    Quasars                            90,596
    Science primary quasars            75,005
    Stars                              215,781
    Sky                                55,555
    Unclassifiable                     12,287
Additional Spectroscopy
  Repeat of main survey plates         62 plates
  SEGUE and SEGUE test plates          80 plates (2 repeated)
  Other southern programs              219 plates (8 repeated)

a Pixel size is 69 km s⁻¹, varying from 0.9 Å (blue end) to 2.1 Å (red end).

b Science primary objects define the set of unique science spectra of objects from main-survey plates (i.e., they exclude repeat observations, sky fibers, spectrophotometric standards, and objects from special plates).

Fig. 4. Distribution of image quality (FWHM of point sources) in the imaging survey, measured in the r band.

DR5 contains several QSO-related tables and views. The QuasarCatalog table lists the individually inspected, luminosity- and line-width-restricted, bona fide quasars from the DR3 sample as published by Schneider et al. (2005). A similar catalog is now being created for DR5 (Schneider et al. 2007). The QSOBunch table contains a record for each “object” flagged as a potential QSO in any of three catalog tables: Target.PhotoObjAll, Best.PhotoObjAll, or SpecObj. For each such object, a bunch record is created that describes the primary photometric, target, and spectroscopic objects within 1.5" of it. Identifiers of nearby objects from each catalog are combined into QSOConcordanceAll records that point to the QSOBunch record. Those identifiers in turn point to the QSObest, QSOtarget, and QSOspec tables, which carry more detailed information about each object. Thus, the QuasarCatalog table provides straightforward access to a set of carefully vetted quasars with well-defined selection criteria, while the QSOConcordanceAll table can be used to identify all objects that were flagged as potential quasars based on photometry and/or spectroscopy.
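Conceptually, building a QSOBunch record is a fixed-radius (1.5") cone search around each flagged object in each of the three catalogs. The sketch below illustrates only that grouping step, assuming each catalog is an in-memory list of (objID, RA, Dec) tuples; it does not reproduce the CAS schema or its SQL, and all names are hypothetical.

```python
import math

BUNCH_RADIUS_ARCSEC = 1.5  # radius quoted in the text

def sep_arcsec(ra1, dec1, ra2, dec2):
    """Flat-sky separation in arcsec; accurate at arcsecond scales away from the poles."""
    dra = ((ra2 - ra1 + 180.0) % 360.0 - 180.0) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    return math.hypot(dra, dec2 - dec1) * 3600.0

def make_bunch(ra, dec, catalogs, radius_arcsec=BUNCH_RADIUS_ARCSEC):
    """Collect, per catalog, the IDs of objects within radius_arcsec of (ra, dec).

    catalogs: dict mapping a catalog label to a list of (obj_id, ra_deg, dec_deg)."""
    return {name: [obj_id for obj_id, r, d in rows if sep_arcsec(ra, dec, r, d) <= radius_arcsec]
            for name, rows in catalogs.items()}


catalogs = {  # hypothetical stand-ins for the target, best, and spectroscopic catalogs
    "target": [(1, 150.0, 2.0), (2, 150.01, 2.0)],
    "best": [(10, 150.0002, 2.0)],
    "spec": [],
}
print(make_bunch(150.0, 2.0, catalogs))  # -> {'target': [1], 'best': [10], 'spec': []}
```

Object 2 lies about 36" away and is excluded; an empty list records that a catalog has no counterpart, which is the situation the concordance tables are meant to expose.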

3. DATA QUALITY

SDSS imaging data are obtained under photometric conditions, as determined by observations from the 0.5 m photometric monitoring telescope and a 10 μm “cloud camera” (Hogg et al. 2001; Tucker et al. 2006). The median seeing of the imaging data is 1.4" in the r band, and essentially all imaging data accepted as survey quality have seeing better than 2" (see Fig. 4). The 95% completeness limit for detection of point sources in the r band is 22.2 mag, estimated from comparison to deeper surveys (Classifying Objects by Medium-Band Observations [COMBO-17] and Canadian Network for Observational Cosmology 2 [CNOC2]). The constancy of stellar population colors over the survey area shows that the photometric calibration is accurate to roughly 0.02 mag in the g, r, and i bands, and 0.03 mag in u and z (Ivezic et al. 2004). Analysis of multiple observations of the southern equatorial stripe shows that photometry of bright stars is repeatable at better than 0.01 mag in all bands and that the photometric pipeline correctly estimates random photometric errors (Ivezic et al. 2007). All magnitudes are roughly on an AB system (Oke & Gunn 1983) and use the “asinh” scale described by Lupton et al. (1999). The astrometric calibration precision is better than 0.1" rms per coordinate (Pier et al. 2003).
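The asinh magnitude scale of Lupton et al. (1999) behaves like a conventional (Pogson) magnitude for bright sources but stays finite for zero or negative measured flux. The sketch below implements m = -(2.5/ln 10)[asinh((f/f0)/(2b)) + ln b], with f/f0 the flux in units of the zero-point flux; the per-band softening parameters b are the values we understand to be listed in the SDSS photometry documentation and should be checked there before use.

```python
import math

# Per-band softening parameters b (assumed values from the SDSS photometry
# documentation; verify before relying on them).
SOFTENING_B = {"u": 1.4e-10, "g": 0.9e-10, "r": 1.2e-10, "i": 1.8e-10, "z": 7.4e-10}

def asinh_mag(flux_ratio, band):
    """asinh magnitude of a flux f/f0 expressed in units of the zero-point flux f0."""
    b = SOFTENING_B[band]
    return -2.5 / math.log(10.0) * (math.asinh(flux_ratio / (2.0 * b)) + math.log(b))

def pogson_mag(flux_ratio):
    """Conventional magnitude; undefined for flux_ratio <= 0."""
    return -2.5 * math.log10(flux_ratio)

def flux_ratio_from_pogson(mag):
    """Flux f/f0 corresponding to a conventional magnitude."""
    return 10.0 ** (-0.4 * mag)


f = flux_ratio_from_pogson(22.2)          # r-band 95% limit from Table 1
print(pogson_mag(f) - asinh_mag(f, "r"))  # small positive offset
print(asinh_mag(0.0, "r"))                # finite even for zero flux
```

With these b values the two scales differ by roughly 0.01 mag at the r-band limit, consistent with the 0.004-0.015 range in the Table 1 footnote, and they converge rapidly for brighter sources.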

Fig. 5. Test of spectrophotometric accuracy, performed by dividing the rest-frame spectra of elliptical galaxies observed over the redshift range 0.04 < z < 0.2 (see text). Points show the residual inferred from 160 redshift-bin spectra (each an average of 300-1000 individual galaxies) spaced by Δz = 0.001, and the central line shows the median residual. [See the electronic edition of the Supplement for a color version of this figure.]

The wavelength calibration uncertainty for SDSS spectra is roughly 0.05 Å. Note that spectra in DR5 (and DR2-DR4) are not corrected for Galactic extinction; this is a change relative to DR1. The spectra are flux-calibrated using observations of F subdwarfs, which are targeted for this purpose on each spectroscopic plate; the calibration procedure is described in § 4.1 of Abazajian et al. (2004). Wilhite et al. (2005) discuss the repeatability of stellar spectra taken more than 50 days apart. Their Figure 4 shows that the distribution of the fractional difference from one observation to another in the flux summed over all pixels in nonvariable stars has a 68% full width of ~5%-8%, depending on S/N. Their Figure 5 shows that the typical offset in the calibration between two epochs of a single plate is 1%-3% over the full observed wavelength range, with no strong features at any wavelength.
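The repeatability statistic quoted from Wilhite et al. (2005) can be computed from two epochs of spectra roughly as sketched below. We assume the spectra are already on a common wavelength grid and normalize the difference by the two-epoch mean flux; Wilhite et al. may normalize differently, so this illustrates the statistic rather than reproducing their analysis.

```python
import numpy as np

def fractional_flux_difference(flux_a, flux_b):
    """Fractional change in pixel-summed flux between two epochs.

    flux_a, flux_b: arrays of shape (n_stars, n_pixels) on a common wavelength grid.
    Dividing by the two-epoch mean is an assumption of this sketch."""
    total_a = np.sum(flux_a, axis=-1)
    total_b = np.sum(flux_b, axis=-1)
    return (total_b - total_a) / (0.5 * (total_a + total_b))

def full_width_68(values):
    """Full width of the central 68% interval (16th to 84th percentile)."""
    lo, hi = np.percentile(values, [16.0, 84.0])
    return hi - lo


rng = np.random.default_rng(0)
print(full_width_68(rng.normal(0.0, 0.03, 100_000)))  # close to 2 * 0.03 for a Gaussian
```

Using percentiles rather than a standard deviation keeps the statistic robust to the occasional spectrum with a bad calibration or a misidentified variable star.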

A useful way to test the quality of spectrophotometry on small scales (<500 Å) is to observe a population of identical objects at a range of redshifts. Because a calibration error is tied to observed wavelength, it falls at a different rest-frame wavelength in each redshift bin, while the intrinsic spectral features stay fixed in the rest frame; spectrophotometric residuals may therefore be computed by dividing the rest-frame spectra of objects in different redshift bins. While no ideal population of identical objects exists, elliptical galaxies have spectra that are similar, on average, over the redshift range z = 0.04-0.20, since they are no longer forming stars.

We select elliptical galaxies for this test using their position in the color-magnitude diagram, with an additional cut at an Hα equivalent width of 2 Å to exclude any objects with ongoing star formation. We average 300-1000 spectra in the rest frame in each of 160 bins of width 0.001 in redshift from z = 0.04 to 0.20. To determine the spectrophotometry residuals, we must fit any evolution with redshift, which can arise from a combination of true passive evolution, slight changes in sample selection, and aperture effects. This is done by fitting a fourth-order polynomial to the flux as a function of redshift at each rest-frame wavelength. We divide the rest-frame spectra by these fits and interpolate back to the observed frame. The median of the residual spectra in the observed frame provides a measure of the spectrophotometry error, i.e., the mean factor by which the flux-calibrated spectrum provided by the spectroscopic pipeline is high or low compared to a perfectly calibrated spectrum. Since the evolutionary fits are themselves affected by the spectrophotometry errors, we apply the estimated correction to the averaged spectra and iterate the process, which converges rapidly.
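The iterative procedure just described can be sketched with NumPy as follows. This is a simplified illustration, not the pipeline behind Figure 5: it assumes the composites are already on a common rest-frame grid and uses an unweighted least-squares polynomial in rescaled redshift at each rest-frame wavelength, linear interpolation back to the observed frame, and a fixed number of iterations in place of a convergence test. The synthetic inputs at the end (spectrum, evolution term, and a 3% modulation with a 100 Å period) are made up for the demonstration.

```python
import numpy as np

def spectrophotometric_residual(lam_rest, z, comp, lam_obs, order=4, n_iter=5):
    """Estimate a multiplicative calibration error R(lambda_obs).

    lam_rest : (n_lam,) increasing rest-frame wavelength grid of the composites
    z        : (n_z,) redshift of each composite (bin center)
    comp     : (n_z, n_lam) composite rest-frame spectrum of each redshift bin
    lam_obs  : (n_obs,) increasing observed-frame grid on which R is returned
    """
    x = (z - z.mean()) / (z.max() - z.min())   # rescaled redshift: well-conditioned fit
    design = np.vander(x, order + 1)           # columns x**order ... x**0
    lam_bins = np.outer(1.0 + z, lam_rest)     # observed wavelength of every composite pixel
    correction = np.ones_like(lam_obs, dtype=float)
    for _ in range(n_iter):
        # 1. Remove the current calibration estimate from every composite.
        data = comp / np.interp(lam_bins, lam_obs, correction)
        # 2. Fit smooth evolution with redshift separately at each rest wavelength.
        coeffs, *_ = np.linalg.lstsq(design, data, rcond=None)
        ratio = data / (design @ coeffs)
        # 3. Move each bin's ratio back to the observed frame (NaN outside its coverage).
        resampled = np.array([np.interp(lam_obs, lam_bins[i], ratio[i],
                                        left=np.nan, right=np.nan)
                              for i in range(len(z))])
        # 4. The median over bins is this iteration's residual; accumulate it.
        correction *= np.nanmedian(resampled, axis=0)
    return correction


# Synthetic check: 160 bins of width 0.001 from z = 0.04 to 0.20.
z = 0.04 + 0.001 * (np.arange(160) + 0.5)
lam_rest = np.arange(3600.0, 6000.0, 2.0)
lam_bins = np.outer(1.0 + z, lam_rest)

def true_calibration(lam):  # injected 3% error with a 100 A period
    return 1.0 + 0.03 * np.sin(2.0 * np.pi * lam / 100.0)

spectrum = 1.0 + 0.2 * np.cos(lam_rest / 300.0)                     # arbitrary rest-frame spectrum
evolution = 1.0 + 0.5 * (z[:, None] - 0.12) * (lam_rest / 5000.0)   # linear in z
comp = spectrum * evolution * true_calibration(lam_bins)

lam_obs = np.linspace(lam_bins.min() + 1.0, lam_bins.max() - 1.0, 4000)
recovered = spectrophotometric_residual(lam_rest, z, comp, lam_obs)
inner = (lam_obs > 4500.0) & (lam_obs < 6500.0)
corr = np.corrcoef(recovered[inner], true_calibration(lam_obs[inner]))[0, 1]
print(f"correlation with injected error: {corr:.3f}")
```

Any observed-frame error that looks like a low-order polynomial in redshift at fixed rest wavelength is absorbed into the evolution fit and cannot be recovered this way, which is why the method is limited to small scales.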

Figure 5 shows both the spectrophotometry residuals inferred from each of the 160 composite spectra and the median of these residuals. There are sharp features associated with calcium and sodium absorption, probably originating in the Galactic interstellar medium, and with night-sky emission lines. The most worrisome features are the wiggles below 4500 Å, with amplitude of ~3%, centered on Ca H and K, Hδ, and Hγ. The coincidence of these wiggles with known spectral features suggests that these residuals are caused by a systematic mismatch between the spectrophotometric standard stars and the model F stars used in the calibration pipeline.
One obvious question is the scale at which we can measure
spectrophotometry errors with this technique. This scale is set by
our ability to discriminate evolution effects from the spectrophotometry
residuals, which in turn is related to the wavelength shift
between our high- and low-redshift bins. We have tested the technique
empirically by adding sine and cosine modulations with
different periods to the observed frame and seeing how well we
recover them. Residuals seem to be well measured on scales less
than 500 Å; i.e., Figure 5 should reveal any systematic errors in
SDSS spectrophotometry with periods shorter than this. On larger scales,
we must rely on the F star spectral models, on tests against white
dwarf model spectra (see Fig. 4 of Abazajian et al. 2004), and on
checks of synthesized magnitudes against the photometry. Collectively,
these tests imply that the flux-calibrated SDSS spectra
can be used for spectrophotometry at the few percent level.

4.  NEW  FEATURES  OF  DR5 
4.1.  Photometric  Redshifts  for  Galaxies 

DR5 includes two estimates of photometric redshifts for galaxies,
calculated with two independent techniques.80 The first
uses the template-fitting algorithm described by Csabai et al. (2003),
which compares the expected colors of a galaxy (derived from
template spectral energy distributions) with those observed for
an individual galaxy. A common approach for template fitting is
to take a small number of spectral templates T (e.g., E, Sbc, Scd,
and Irr galaxies) and choose the best fit by optimizing the likelihood
of the fit as a function of redshift, type, and luminosity,
p(z, T, L). We use a variant of this method that incorporates a
continuous distribution of spectral templates, enabling the error
function in redshift and type to be well defined. Since a representative
set of photometrically calibrated spectra in the full wavelength
range of the filters is not easy to obtain, we have started from the
empirical templates of Coleman et al. (1980), extended them with
spectral synthesis models, and adjusted them to fit the colors of
galaxies in the training set (Budavari et al. 2000). The results are
listed in the CAS table Photoz, which includes the estimate of
the redshift, spectral type, rest-frame colors, rest-frame absolute
magnitudes, errors on all of these quantities, and a quality flag.
All photometric objects have an entry in the Photoz table, regardless
of whether they are photometrically classified as galaxies
or stars, so it is essential to consult the quality flag and error
characterizations when using the photometric redshifts.
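The likelihood fit over redshift, type, and luminosity can be illustrated with a toy grid search in Python. This is a hedged sketch, not the Csabai et al. (2003) code: the two analytic SEDs, the mixing parameter `t` standing in for a continuous template family, and the use of approximate filter effective wavelengths instead of full filter curves are all assumptions made for illustration. The luminosity scale `L` is solved analytically at each grid point by weighted least squares, and a marginal redshift likelihood is formed by summing over `t`.

```python
import math

# Approximate effective wavelengths (Angstrom) of the SDSS u, g, r, i, z filters.
BANDS = [3551.0, 4686.0, 6166.0, 7480.0, 8932.0]


def red_sed(lam):
    """Toy early-type SED: smooth 4000 A break times a broad hump (illustrative only)."""
    x = math.log(lam / 5000.0)
    step = 0.3 + 0.7 / (1.0 + math.exp(-(lam - 4000.0) / 100.0))
    return step * math.exp(-(x / 0.6) ** 2)


def blue_sed(lam):
    """Toy star-forming SED peaking in the blue (illustrative only)."""
    x = math.log(lam / 3000.0)
    return math.exp(-(x / 0.9) ** 2)


def model_fluxes(z, t):
    """Band fluxes of the mixture (1 - t) * red + t * blue, sampled at the
    rest-frame wavelength each filter sees (no filter-curve integration)."""
    out = []
    for lam in BANDS:
        rest = lam / (1.0 + z)
        out.append((1.0 - t) * red_sed(rest) + t * blue_sed(rest))
    return out


def fit_photoz(obs, sigma, z_grid, t_grid):
    """Grid search over (z, t); the luminosity scale L is solved analytically."""
    w = [1.0 / s ** 2 for s in sigma]
    best = None
    chi2_grid = {}
    for z in z_grid:
        for t in t_grid:
            m = model_fluxes(z, t)
            L = (sum(wi * oi * mi for wi, oi, mi in zip(w, obs, m))
                 / sum(wi * mi * mi for wi, mi in zip(w, m)))
            chi2 = sum(wi * (oi - L * mi) ** 2 for wi, oi, mi in zip(w, obs, m))
            chi2_grid[(z, t)] = chi2
            if best is None or chi2 < best[3]:
                best = (z, t, L, chi2)
    # Marginal likelihood p(z) proportional to sum_t exp(-chi2 / 2), normalized on the grid.
    chi2_min = best[3]
    pz = [sum(math.exp(-(chi2_grid[(z, t)] - chi2_min) / 2) for t in t_grid)
          for z in z_grid]
    norm = sum(pz)
    return best, [p / norm for p in pz]
```

The templates need curvature (here a 4000 Å break and a Gaussian hump in log wavelength): a pure power-law SED shifted in redshift is only rescaled, which would make redshift fully degenerate with luminosity.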

The second photometric redshift estimate uses a neural network
method that is very similar in implementation to that of Collister &
Lahav (2004). The training set consists of 140,000 single-pass
SDSS photometry measurements with spectroscopic redshifts from
various sources: the SDSS (110,000 redshifts), CNOC2 (Yee et al.
2000; 9000 redshifts), the Canada-France Redshift Survey (Lilly
et al. 1995; 1000 redshifts), the Deep Extragalactic Evolutionary Probe
(DEEP) and DEEP2 (Weiner et al. 2005; 1700 redshifts), the Team
Keck Redshift Survey/Great Observatories Origins Deep Survey
(Wirth et al. 2004; 300 redshifts), and the 2SLAQ LRG surveys
(Cannon et al. 2006; 27,000 redshifts). The SDSS portion of the
training set consists of a representative sampling of the SDSS main,
LRG, and southern survey spectroscopic data; the other surveys are
used to augment the training set at magnitudes fainter than those probed
by the SDSS spectroscopic samples. Note that the training set multiply
counts independent, repeat SDSS photometric measurements
of the same objects, in particular on SDSS stripe 82. Photometric
redshift  errors  are  computed  using  the  nearest  neighbor  error 

80  See  http://skyserver.elte.hu/PhotoZ/  and http://yummy.uchicago.edu/SDSS/ 
for  details. 


method, which assigns to each object an error based on the photometric
redshift error distribution of objects with similar magnitude
and color in the training set (for which the true redshifts
are known); this approach is found to predict the
errors accurately (H. Oyaizu et al. 2007, in preparation). The trained network
is tested on a larger validation set consisting of 1,700,000 objects
with SDSS photometry (counting independent repeat measurements)
for which spectroscopic redshifts are available. The input
catalogs for these photometric redshift measurements were
derived from the SDSS photo pipeline outputs, with a few
additional cuts, based on the point-spread function (PSF) probability
and the lensing smear polarizability (Sheldon et al. 2004), employed
to improve the star-galaxy separation. The photometric
sample was cut at a galaxy probability greater than 0.8
(a very stringent cut) and a smear polarizability less than 0.8,
and further cuts on magnitude were also made; hence, not all
DR5 objects are included. The Photoz2 table lists a photometric
redshift, an error, and a quality flag. For objects with all five
SDSS magnitudes measured, the flag is set to 0 if r < 20 or 2 if
r > 20; photometric redshifts for flag = 2 objects are subject
to larger uncertainties. Objects not satisfying the above conditions
have their flags set to 1 or 3, and their photometric redshifts
should not be used. There are 12.6 million objects in the
DR5 data set with a Photoz2 flag of 0 and another 59.0 million
with a flag of 2. In the validation set, 68% of flag = 0 galaxies
have photometric redshifts within 0.026 of the measured spectroscopic
redshift (in the range 0.001 < z < 1.5). The rms dispersion
between photometric and spectroscopic redshifts is higher,
σ = 0.039, a consequence of the non-Gaussian tails of the error
distribution.
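Two ideas in this paragraph can be made concrete in a few lines of Python: the nearest-neighbor error estimate and the difference between a 68% width and an rms dispersion. This is a hedged sketch. The choice of a Euclidean metric in magnitude/color space, the neighbor count `k`, and the use of the 68th percentile of |z_phot − z_spec| among neighbors as "the error" are assumptions for illustration, not the exact prescription of Oyaizu et al.; the function names are hypothetical.

```python
import math


def frac68(values):
    """Value below which at least 68% of |values| fall."""
    a = sorted(abs(v) for v in values)
    idx = max(0, math.ceil(0.68 * len(a) - 1e-9) - 1)
    return a[idx]


def rms(values):
    """Root-mean-square of the values (sensitive to outlying tails)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def nne_error(query, train_features, train_dz, k):
    """Nearest-neighbor error: the 68% width of |z_phot - z_spec| among the k
    training objects closest to `query` in magnitude/color space."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    order = sorted(range(len(train_features)),
                   key=lambda i: dist2(query, train_features[i]))
    return frac68([train_dz[i] for i in order[:k]])
```

For example, if 90% of objects have |Δz| = 0.01 and 10% are catastrophic outliers with |Δz| = 0.5, `frac68` returns 0.01 while `rms` is about 0.16: a few non-Gaussian tails inflate the rms far more than the 68% width, which is the same effect behind the 0.026 versus 0.039 figures quoted above.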

Figure  6  plots  the  two  photometric  redshift  estimates  against 
spectroscopic  redshifts  and  against  each  other  for  20,000  objects 
selected from the DR5 database. These are objects that have SDSS
spectroscopic redshifts, are spectroscopically classified as
galaxies, and have a PhotoZ quality flag of 4 or 5 and a PhotoZ2 flag of
0 or 2. Both estimates show a tight correlation with spectroscopic
redshift for the great majority of sources, while PhotoZ shows a
somewhat larger fraction of outliers with overestimated photometric
redshifts.

4.2.  Regions  and  Sectors 

Each  survey  observation,  imaging  or  spectroscopic,  covers  a 
certain  region  of  the  sky.  Doing  statistical  calculations  with  the 
SDSS  data  usually  requires  performing  computations  over  these 
regions  and  the  intersections  among  them,  e.g.,  to  normalize 
luminosity functions or calculate completeness corrections. Typical
questions are: How much area do these regions cover? How
much do they overlap? Which regions contain a certain point
or area of the sky? The DR5 CAS includes tables that precisely
describe each region and built-in tools for finding the connections
and overlaps between one kind of region and another. Each
Region in the CAS is represented as a union of spherical polygons,
and its area is analytically calculated and stored.
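One standard way to compute the area of a spherical polygon analytically is Girard's theorem: the area in steradians equals the sum of the interior angles minus (n − 2)π. The Python sketch below applies it to convex polygons with great-circle edges and converts to square degrees. It is an illustration only; it is not the CAS implementation, and treating a Region as a union of *disjoint* convex polygons (so that areas simply add) is an assumption. The function names are hypothetical.

```python
import math

SQDEG_PER_SR = (180.0 / math.pi) ** 2  # about 3282.8 deg^2 per steradian


def radec_to_vec(ra_deg, dec_deg):
    """Unit vector for equatorial coordinates given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _interior_angle(prev, v, nxt):
    """Angle at vertex v between the great-circle arcs v->prev and v->nxt."""
    t1 = [p - _dot(prev, v) * c for p, c in zip(prev, v)]  # tangent toward prev
    t2 = [n - _dot(nxt, v) * c for n, c in zip(nxt, v)]    # tangent toward next
    cosang = _dot(t1, t2) / math.sqrt(_dot(t1, t1) * _dot(t2, t2))
    return math.acos(max(-1.0, min(1.0, cosang)))


def convex_polygon_area_deg2(vertices_radec):
    """Area of a convex spherical polygon (vertices in order, great-circle
    edges) from Girard's theorem: sum of interior angles minus (n - 2) pi."""
    v = [radec_to_vec(ra, dec) for ra, dec in vertices_radec]
    n = len(v)
    angles = sum(_interior_angle(v[i - 1], v[i], v[(i + 1) % n]) for i in range(n))
    return (angles - (n - 2) * math.pi) * SQDEG_PER_SR


def region_area_deg2(polygons):
    """A region as a union of disjoint convex polygons: the areas simply add."""
    return sum(convex_polygon_area_deg2(p) for p in polygons)
```

A quick check: the octant with vertices at (0°, 0°), (90°, 0°), and (0°, 90°) has three right angles, so its area is π/2 sr, one eighth of the ~41,253 deg² sky.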

The  SDSS  has  many  different  types  of  regions;  they  include 
the  stripes,  camera  columns,  segments,  chunks,  and  spectroscopic 
tiles  that  are  the  basis  of  the  SDSS  observing  and  target  selection 
strategy.  The  survey  stripes  overlap  at  the  edges,  with  the  overlap 
increasing  toward  the  survey  poles,  so  they  are  clipped  into  disjoint 
“staves”  centered  on  each  stripe  that  uniquely  cover  the  survey 
area (like the staves of a barrel). The union of the staves within the
survey  boundaries  defines  the  survey’s  “primary”  photometric  area. 
There  are  “holes”  inside  the  stripes  and  staves,  consisting  of  fields 
that  were  declared  to  be  of  inferior  quality  (e.g.,  because  of  degraded 
seeing  or  contamination  by  the  saturated  pixels  of  a  bright  star 
Fig. 6. Comparison of photometric redshift estimates PhotoZ and PhotoZ2
to SDSS spectroscopic redshifts and to each other.

and its wings). The portions of these holes that lie within the primary
survey area are called TIHOLEs to emphasize their role in
the tiling process, as explained below.

As a simple example of the region tables, let us calculate the
photometric survey area. Imaging data are imported to the database
in “chunks,” and the total area of these chunks can be obtained
from the SQL (Structured Query Language) query81

select sum(area) from Region where type = 'CHUNK'

yielding 9560 deg². However, this counts overlapping areas more
than  once.  To  obtain  the  unique  survey  imaging  footprint,  we 
select  only  the  “primary”  region,  the  intersection  of  the  chunks 
with  the  staves, 

select sum(area) from Region where type = 'PRIMARY'

yielding 7897 deg². The total area and unique footprint area
should  be  adjusted  downward  by  the  area  of  the  holes,  obtained 
from  the  queries 

select sum(area) from Region where type = 'HOLE'
for  the  chunks  and 

select sum(area) from Region where type = 'TIHOLE'

for the primary area. These queries yield 26 and 23 deg², respectively,
making the final precise numbers for the photometric
survey area 9534 deg² in total and 7875 unique deg² within the
main survey boundaries. (The 8000 deg² figure quoted elsewhere
includes a small amount of imaging outside of the ellipse that
defines the main-survey boundary.)
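The hole correction described above can be reproduced end to end with Python's built-in `sqlite3` module. This is only a stand-in: the CAS itself runs on a different database server, the toy table keeps just the two columns the queries use (`type`, `area`), and every row and area value below is hypothetical rather than DR5 data.

```python
import sqlite3

# Hypothetical toy rows; only the column names used in the text are reproduced.
rows = [
    ("CHUNK", 120.0), ("CHUNK", 95.5),
    ("PRIMARY", 180.0),
    ("HOLE", 1.5), ("HOLE", 0.5),
    ("TIHOLE", 1.0),
]

con = sqlite3.connect(":memory:")
con.execute("create table Region (type text, area real)")
con.executemany("insert into Region values (?, ?)", rows)


def total_area(region_type):
    """select sum(area) from Region where type = ? (0 if there are no rows)."""
    (value,) = con.execute(
        "select coalesce(sum(area), 0) from Region where type = ?", (region_type,)
    ).fetchone()
    return value


total = total_area("CHUNK") - total_area("HOLE")       # overlaps counted more than once
unique = total_area("PRIMARY") - total_area("TIHOLE")  # unique footprint
print(total, unique)  # prints: 213.5 179.0
```

The structure mirrors the text: chunk area minus HOLE area gives the total imaged area, and PRIMARY area minus TIHOLE area gives the unique footprint.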

For  analyses  of  spectroscopic  samples,  the  issues  are  more 
complex.  The  SDSS  spectroscopic  survey  aims  to  sample  quasars 
and  galaxies  uniformly  over  the  sky,  with  additional  spectra  for 
other samples (not necessarily uniform) of science targets, calibration
objects, and sky. In practice, after an area has been observed
by the photometric survey, a series of targeting pipelines
creates lists of targets that satisfy the selection criteria. A “tiling”
program (Blanton et al. 2003) runs over a subset of the observed
area and assigns targets to circular “tiles” of diameter 2.98°; it
also determines which targets are assigned fiber holes on which
spectroscopic plugplate, imposing physical constraints such as
the 55″ minimum spacing between fibers. A given run of the
tiling program operates on the union of a set of “rectangular” (in
spherical coordinates) TilingGeometry areas.
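The 55″ fiber-spacing constraint can be illustrated with a deliberately simple greedy pass in Python. To be clear, this is not the Blanton et al. (2003) tiling algorithm, which optimizes tile centers and exploits tile overlaps to recover collided targets; it only shows how the minimum-separation rule removes targets from a single plate. The function names and the priority ordering of the input list are assumptions.

```python
import math

FIBER_MIN_SEP_ARCSEC = 55.0  # minimum fiber spacing quoted in the text


def angsep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine form); inputs in degrees, output in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def assign_fibers_greedy(targets, min_sep=FIBER_MIN_SEP_ARCSEC):
    """Keep targets in priority order (list order), skipping any that would sit
    closer than min_sep to an already-assigned fiber. Returns kept indices."""
    kept = []
    for i, (ra, dec) in enumerate(targets):
        if all(angsep_arcsec(ra, dec, *targets[j]) >= min_sep for j in kept):
            kept.append(i)
    return kept
```

With targets at 0″, 30″, and 100″ along a line, the second is dropped (30″ from the first) and the third is kept, since it lies 100″ from the only assigned fiber.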

For calculations of galaxy or quasar clustering, one needs to
compute the completeness of the spectroscopic sample as a function
of sky position. The natural scale on which to do this is that
of a SECTOR, a region that is covered by a unique set of Tile
overlaps (e.g., by a particular spectroscopic plate or by two or
more plates that overlap). These are regions over which the completeness
should be nearly uniform (see, e.g., Fig. 1 of Percival
et al. [2007] and earlier discussions by Tegmark et al. [2004] and
Blanton et al. [2005]). The Target table lists (in the column
Target.regionID) the SECTOR for every object selected by the
spectroscopic target selection algorithms, regardless of whether
or not that object has been spectroscopically observed. To find
the SECTOR for an object in the main table of spectroscopically
observed objects, SpecObj, one must first identify the

81  See  http://cas.sdss.org/dr5/en/help/docs/sql_help.asp.  The  text  follows 
our  standard  capitalization  conventions;  for  example,  the  various  types  of  entries 
in  the  Region  table  (CHUNK,  TILE,  etc.)  are  listed  in  all  capital  letters.  However, 
queries  are  not  case  sensitive. 


No.  2,  2007 


SDSS  DR5 


643 


corresponding entry in the Target table. For example, the following
query

select  top  10  s.specObjID,  t.regionID 
from  SpecObj  s  join  Target  t 
on  s.targetID  =  t.targetID 

returns the spectroscopic ID numbers and the SECTOR numbers
of the first 10 objects encountered in the SpecObj table. The database
function fRegionsContainingPointEQ can be used to
find the SECTOR that covers a specified point on the sky.
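What a point-in-region lookup does can be sketched in Python. The sketch assumes each region is stored as a union of convex pieces, and each convex piece as an intersection of half-spaces `n · x >= c` with `n` a unit vector on the sphere. That representation, the dictionary layout, and the function `regions_containing_point_eq` are illustrative assumptions; the sketch does not claim to reproduce the internals or the output columns of `fRegionsContainingPointEQ`.

```python
import math


def radec_to_vec(ra_deg, dec_deg):
    """Unit vector for equatorial coordinates given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def in_convex(vec, halfspaces):
    """A convex is the intersection of half-spaces n . x >= c (n a unit vector)."""
    return all(nx * vec[0] + ny * vec[1] + nz * vec[2] >= c
               for nx, ny, nz, c in halfspaces)


def regions_containing_point_eq(ra, dec, regions):
    """Hypothetical Python analogue of a point-in-region lookup: return the IDs
    of regions (each a list of convexes) that contain the point (ra, dec)."""
    v = radec_to_vec(ra, dec)
    return sorted(rid for rid, convexes in regions.items()
                  if any(in_convex(v, cx) for cx in convexes))
```

A single half-space with `c = cos θ` is a circular cap of radius θ, the shape of a spectroscopic tile, and `n = (0, 0, 1)` with `c = 0` is the hemisphere north of the celestial equator. Intersections and unions of such pieces can describe SECTORs.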

The following practical example illustrates several other features
of these tables. The SDSS quasar target selection algorithm
underwent significant changes in the early phases of the
survey, reaching its final form (Richards et al. 2002) with
targetVersion 3.1.0, following DR1. A calculation of the
quasar luminosity function should therefore be restricted to regions
targeted with this or subsequent versions of the target selection
code, and it should be normalized using the corresponding
area, which the following query shows to be 4013 deg²:

select sum(area)
from Region
where regionID in (
    select b.boxID
    from Region2Box b join TilingGeometry g
        on b.id = g.tilingGeometryID
    where b.boxType = 'SECTOR'
      and b.regionType = 'TIPRIMARY'
    group by b.boxID
    having min(g.targetVersion) >= 'v3_1_0'
)

This query uses the Region2Box table, which maps between
various types of Regions and the TilingGeometries in which
information about the target selection is stored. The where clause
selects, from the table of all Regions, those that are SECTORs in
the primary tiled area and whose covering TilingGeometries were all
targeted with the final version (v3_1_0 or later) of the quasar target
selection algorithm.82
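The group by / having logic of this query can be checked on a toy schema with Python's built-in `sqlite3` module. All rows, areas, and the early version string `'v2_13_5'` are hypothetical, and only the columns the query touches are created. The point is that a SECTOR counts only if the *minimum* targetVersion over every TilingGeometry covering it passes the cut. Note that this is a string comparison: it orders these particular version strings correctly, but it would not order multi-digit components numerically (for example, `'v3_10_0' < 'v3_2_0'` as strings).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Region (regionID integer, type text, area real);
create table Region2Box (id integer, boxID integer, boxType text, regionType text);
create table TilingGeometry (tilingGeometryID integer, targetVersion text);

-- Hypothetical toy rows: three sectors and the tiling geometries covering them.
insert into Region values (1, 'SECTOR', 10.0), (2, 'SECTOR', 5.0), (3, 'SECTOR', 2.5);
insert into TilingGeometry values (100, 'v3_1_0'), (101, 'v3_1_0'), (102, 'v2_13_5');
insert into Region2Box values
    (100, 1, 'SECTOR', 'TIPRIMARY'),
    (101, 1, 'SECTOR', 'TIPRIMARY'),
    (100, 2, 'SECTOR', 'TIPRIMARY'),
    (102, 2, 'SECTOR', 'TIPRIMARY'),  -- sector 2 also touches an early version
    (101, 3, 'SECTOR', 'TIPRIMARY');
""")

query = """
select sum(area)
from Region
where regionID in (
    select b.boxID
    from Region2Box b join TilingGeometry g
        on b.id = g.tilingGeometryID
    where b.boxType = 'SECTOR'
      and b.regionType = 'TIPRIMARY'
    group by b.boxID
    having min(g.targetVersion) >= 'v3_1_0'
)
"""
(area,) = con.execute(query).fetchone()
print(area)  # prints: 12.5 (sectors 1 and 3 qualify; sector 2 is excluded)
```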

In principle, these tables provide all the information needed
for complex clustering calculations, e.g., determining local completeness
corrections, generating appropriate catalogs of randomly
distributed  points,  and  identifying  targeted  objects  that  were  not 
observed  because  of  the  minimum  fiber  spacing  constraint.  The 
queries  required  for  such  calculations  are  rather  lengthy  and  will 
be  presented  and  documented  elsewhere. 
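Since the full queries are left for later documentation, here is only a generic, hedged sketch of two ingredients mentioned above: a per-sector completeness (observed over targeted) and a random catalog that is uniform on the sphere and thinned by that completeness. A real SECTOR is a union of spherical polygons; the RA/Dec box used here and the function names are simplifying assumptions.

```python
import math
import random


def sector_completeness(n_targeted, n_observed):
    """Fraction of targets in a sector that received a spectrum."""
    return n_observed / n_targeted if n_targeted else 0.0


def random_points_in_box(n, ra_min, ra_max, dec_min, dec_max, completeness, seed=0):
    """Uniform-on-the-sphere random points in an RA/Dec box, thinned by completeness.

    Drawing sin(dec), not dec, uniformly keeps the surface density constant."""
    rng = random.Random(seed)
    s_lo, s_hi = math.sin(math.radians(dec_min)), math.sin(math.radians(dec_max))
    points = []
    for _ in range(n):
        ra = rng.uniform(ra_min, ra_max)
        dec = math.degrees(math.asin(rng.uniform(s_lo, s_hi)))
        if rng.random() < completeness:  # keep with probability = completeness
            points.append((ra, dec))
    return points
```

Thinning the randoms, rather than weighting the galaxies, keeps the random catalog a direct model of the observed selection function within each sector.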

4.3.  Match  Tables 

About  50  million  photometric  objects  in  the  CAS  lie  in  regions 
that  have  been  observed  more  than  once,  because  of  stripe  overlap 
or  repeat  scans.  These  repeat  observations  can  be  used  to  detect 
variable  and  moving  objects.  The  MatchHead  and  Match  tables 
of  the  DR5  CAS  provide  convenient  tools  to  examine  the  multiple 
observations  of  a  single  object,  identified  by  positional  matches 

82 This query is included as one of the sample queries in the DR5 documentation,
under “Uniform Quasar Sample,” together with a longer query that shows
how to extract all quasars and quasar candidates from the corresponding sky area.


with a 1″ tolerance and collectively referred to as a bundle. The
MatchHead table has the unique ID of the first object in the bundle
(defined by observation date), the mean and variance of the coordinates
of all matched detections, the number of matched detections,
and the number of times the object was “missed” in other
observations  of  the  same  sky  area.  Misses  can  occur  because  the 
object  is  variable,  because  it  is  moving,  because  inferior  seeing 
moves  it  below  the  detection  threshold,  or  because  the  original 
detection  was  spurious.  The  Match  table  lists  all  objects  in  each 
bundle. 
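The bundling rule can be sketched in Python: detections are processed in date order, each one joins the first bundle whose head lies within the 1″ tolerance, and each bundle reports MatchHead-style summaries. This is a hedged illustration, not the CAS matching code. It uses a flat small-angle separation, ignores the RA wrap at 0°/360°, compares against the head position rather than a running mean, and uses hypothetical field names.

```python
import math

MATCH_RADIUS_ARCSEC = 1.0  # positional tolerance quoted in the text


def sep_arcsec_flat(a, b):
    """Small-angle separation between (ra, dec) pairs in degrees, in arcsec."""
    dra = (a[0] - b[0]) * math.cos(math.radians(0.5 * (a[1] + b[1])))
    return math.hypot(dra, a[1] - b[1]) * 3600.0


def build_bundles(detections, radius=MATCH_RADIUS_ARCSEC):
    """detections: list of (objID, mjd, ra, dec). Returns one MatchHead-like
    dict per bundle, headed by its earliest detection."""
    bundles = []
    for obj_id, mjd, ra, dec in sorted(detections, key=lambda d: d[1]):
        for b in bundles:
            if sep_arcsec_flat((ra, dec), b["head_pos"]) <= radius:
                b["members"].append((obj_id, ra, dec))
                break
        else:  # no existing bundle within the tolerance: start a new one
            bundles.append({"head": obj_id, "head_pos": (ra, dec),
                            "members": [(obj_id, ra, dec)]})
    heads = []
    for b in bundles:
        ras = [m[1] for m in b["members"]]
        decs = [m[2] for m in b["members"]]
        n = len(ras)
        mra, mdec = sum(ras) / n, sum(decs) / n
        heads.append({
            "objID": b["head"], "nobs": n, "ra": mra, "dec": mdec,
            "var_ra": sum((r - mra) ** 2 for r in ras) / n,
            "var_dec": sum((d - mdec) ** 2 for d in decs) / n,
        })
    return heads
```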

As  an  example,  the  following  query  lists  information  about 
the  multiple  detections  of  an  object  at  (ra,  dec)=(194,  0): 

select MH.*
from MatchHead MH
join fGetNearbyObjEq(194, 0, 0.3) N on MH.objID = N.objID

The fGetNearbyObjEq function returns a table (assigned the
name N) of all objects found within 0.3′ of the desired coordinates.
The select command returns all entries in the MatchHead
table (assigned the name MH) that, as a result of the join command,
have an object ID that matches one returned by the neighborhood
search. In this case, there is just one such match and,
hence, a single bundle. One can get information on all the objects
in the bundle with the query

select M.*
from Match M
join MatchHead MH on M.matchHead = MH.objID
join fGetNearbyObjEq(194, 0, 0.3) N on MH.objID = N.objID

where the new join command selects those Match table entries
whose matchHead agrees with the bundle head returned by the earlier
query.
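The logic of the two queries, a cone search followed by joins on the bundle head, can be mirrored in plain Python to make the join semantics explicit. Here `get_nearby_obj_eq` is a hypothetical stand-in for `fGetNearbyObjEq`: it takes the radius in arcminutes, as in the 0.3′ example, and does not reproduce that function's real output columns. The dictionary layout of the toy tables is also an assumption.

```python
import math


def angsep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine form); degrees in, arcsec out."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def get_nearby_obj_eq(objects, ra, dec, radius_arcmin):
    """Hypothetical stand-in for fGetNearbyObjEq: IDs of objects within
    radius_arcmin of (ra, dec). `objects` maps objID -> (ra, dec) in degrees."""
    limit = radius_arcmin * 60.0
    return {oid for oid, (ora, odec) in objects.items()
            if angsep_arcsec(ra, dec, ora, odec) <= limit}


def bundle_members_near(objects, match_heads, match_rows, ra, dec, radius_arcmin):
    """Mirror of the two queries: heads found by the cone search, then all
    Match rows whose matchHead is one of those heads."""
    near = get_nearby_obj_eq(objects, ra, dec, radius_arcmin)              # table N
    heads = [h for h in match_heads if h in near]                          # first query
    members = [r["objID"] for r in match_rows if r["matchHead"] in heads]  # second query
    return heads, members
```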

The  DR5  CAS  has  50,627,023  bundles  described  by  MatchHead 
and  109,441,410  objects  in  the  Match  table.  When  an  object  is 
undetected in a repeat observation of the same area of sky, a surrogate
object is placed in the Match table but marked as a “miss,”
with an additional flag to indicate if the miss could be caused by
masking of the region in the second observation (e.g., because
of a satellite trail or cosmic-ray hit) or because it lies near the
edge of the overlap region. A bundle may therefore consist of a
single detection and one or more surrogates (and the object in
the MatchHead may be a surrogate). There are 9.8 million surrogates
in the Match table. The presence of surrogate objects may
simplify  algorithmic  searches  for  moving  or  variable  objects. 
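As an illustration of how surrogates can simplify such searches, the Python sketch below flags bundles that contain at least one miss not explained by masking or by proximity to the overlap edge. The row keys `miss`, `masked`, and `edge` are hypothetical stand-ins for the actual Match table flags, and treating every unexplained miss as a variability or motion candidate is a simplifying assumption. Inferior seeing or a spurious original detection can also cause misses.

```python
def variability_candidates(match_rows):
    """Return bundle heads with at least one miss that is not explained by
    masking or by lying near the edge of the overlap region.

    match_rows: dicts with hypothetical keys 'matchHead', 'miss', 'masked', 'edge'."""
    unexplained = {}
    for row in match_rows:
        head = row["matchHead"]
        unexplained.setdefault(head, 0)
        if row["miss"] and not (row["masked"] or row["edge"]):
            unexplained[head] += 1
    return sorted(h for h, n in unexplained.items() if n > 0)
```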

Because  the  multiple  imaging  scans  of  the  southern  equatorial 
stripe  are  not  yet  in  the  CAS,  the  Match  tables  cannot  be  used  to 
search  for  moving  or  variable  objects  in  these  data.  However,  this 
capability  will  be  present  in  future  data  releases. 

5.  CONCLUSIONS 

The  Fifth  Data  Release  of  the  Sloan  Digital  Sky  Survey  provides 
access to 8000 deg² of five-band imaging data and over one million
spectra.  These  data  represent  a  roughly  20%  increase  over  the 
previous  data  release  (DR4;  Adelman-McCarthy  et  al.  2006).  Both 
the  catalog  data  and  the  source  imaging  data  are  available  via  the 
Internet.  All  the  data  products  have  been  consistently  processed  by 
the  same  set  of  pipelines  across  several  data  releases.  The  previous 
data  releases  remain  online  and  unchanged  to  support  ongoing 
science  studies.  DR5  includes  several  qualitatively  new  features: 
multiple  imaging  scans  of  the  southern  equatorial  stripe,  special 
imaging scans of M31 and the Perseus Cluster, database access
to QSO catalogs and galaxy photometric redshifts, and database
tools for precisely defining the survey geometry and for linking
repeat imaging observations of matched objects. More than
a thousand scientific publications have been based on the SDSS
data to date, spanning an enormous range of subjects. Future data
releases will increase the survey area, and they will provide qualitatively
new kinds of data on the stellar kinematics and populations
of the Milky Way and on Type Ia supernovae and other transient
or variable phenomena, further extending this scientific impact.


We dedicate this paper to our colleague Jim Gray, who disappeared
in 2007 January while sailing near San Francisco. Jim
dedicated an enormous amount of his time, his energy, and his remarkable
talents to the SDSS over the course of many years. He
played a critical role in the development of the SDSS database,
including important contributions to the writing of this paper.